Model selection process
Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects
Tavares, Cristina, Nascimento, Nathalia, Alencar, Paulo, Cowan, Donald
Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and non-functional requirements such as performance and bias. However, the factors that influence such selection are often not well understood and explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments in a specific case study, a heart failure prediction project, to illustrate our approach. The proposed approach aims to advance the state of the art by making explicit the factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects from an ad hoc activity into an adaptive and explainable process.
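As one way to picture how a bias-related metric can act as a selection factor alongside accuracy, here is a minimal, hypothetical sketch. The metric (demographic parity difference), the threshold, the sensitive attribute `group`, and the helper names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: treat a bias metric as a hard constraint during selection.
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups (assumed binary)."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def select_model(models, X_tr, y_tr, X_val, y_val, group, max_dpd=0.1):
    """Pick the most accurate candidate whose bias metric stays under a threshold."""
    best_name, best_acc = None, -1.0
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_val)
        acc = (pred == y_val).mean()
        if demographic_parity_diff(pred, group) <= max_dpd and acc > best_acc:
            best_name, best_acc = name, acc
    return best_name, best_acc
```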
Training, validation and testing for supervised machine learning models
Validating and testing our supervised machine learning models is essential to ensuring that they generalize well. SAS Viya makes it easy to train, validate, and test our machine learning models. Training data are used to fit each model. Training a model involves using an algorithm to determine model parameters (e.g., weights) or other logic to map inputs (independent variables) to a target (dependent variable). Model fitting can also include input variable (feature) selection.
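As a concrete illustration of this train/validation/test workflow, the sketch below uses scikit-learn rather than SAS Viya; the split ratios, dataset, and logistic regression model are assumptions made for the example.

```python
# Minimal train/validation/test workflow sketch (scikit-learn, not SAS Viya).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out 40% of the data, then split the holdout evenly into validation and test.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # fit on training data
print("validation accuracy:", model.score(X_val, y_val))         # tune/select on validation
print("test accuracy:", model.score(X_test, y_test))             # report once on test
```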
Automating the machine learning model selection process
The traditional machine learning model selection process is largely iterative, with data scientists searching for the best model and the best hyperparameters to fit a given dataset. Following the philosophy I've learnt from fast.ai, this blog is an introduction to the process; a more comprehensive example can be found here. The intended audience is data analysts learning data science who have a few weeks of Python experience and a basic understanding of numpy and pandas. For new learners, this can serve as a top-down introduction to the process.
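To make the iterative search concrete, here is a minimal sketch that compares a few candidate models with cross-validation and keeps the best scorer; the candidate list, dataset, and default scoring metric are illustrative choices, not the post's actual code.

```python
# Sketch: score each candidate with 5-fold cross-validation, pick the best mean score.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: cross_val_score(est, X, y, cv=5).mean() for name, est in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("best model:", best)
```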
Yellowbrick: Machine Learning Visualization -- yellowbrick 0.7 documentation
Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines scikit-learn with matplotlib in the best tradition of the scikit-learn documentation, but to produce visualizations for your models! For more on Yellowbrick, please see the About. If you're new to Yellowbrick, check out the Quick Start or skip ahead to the Model Selection Tutorial. Yellowbrick is a rich library with many Visualizers being added on a regular basis.
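As a taste of the Visualizer API, the sketch below wraps a classifier in Yellowbrick's ClassificationReport; the dataset and model are illustrative choices, and note that releases from the 0.7 era rendered with poof() rather than the current show().

```python
# Sketch: steer model selection with a Yellowbrick Visualizer.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from yellowbrick.classifier import ClassificationReport

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

viz = ClassificationReport(GaussianNB())  # wrap the estimator in a Visualizer
viz.fit(X_train, y_train)                 # fit the wrapped model
viz.score(X_test, y_test)                 # compute per-class precision/recall/F1
viz.show()                                # render the heatmap (poof() in 0.7-era releases)
```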
Train on Validation: Squeezing the Data Lemon
Tennenholtz, Guy, Zahavy, Tom, Mannor, Shie
Model selection on validation data is an essential step in machine learning. While the mixing of data between training and validation is considered taboo, practitioners often violate it to increase performance. Here, we offer a simple, practical method for using the validation set for training, which allows for a continuous, controlled trade-off between performance and overfitting of model selection. We define an on-average-validation-stable algorithm as one for which using small portions of validation data for training does not overfit the model selection process. We then prove that stable algorithms are also validation stable. Finally, we demonstrate our method on the MNIST and CIFAR-10 datasets using stable algorithms as well as state-of-the-art neural networks. Our results show a significant increase in test performance with a minor trade-off in bias admitted to the model selection process.
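A minimal sketch of the data-mixing idea the abstract describes, assuming numpy arrays: move a small, controlled fraction of the validation set into the training set. The fraction `alpha` and the random selection scheme are illustrative assumptions, not the authors' algorithm.

```python
# Sketch: carve a fraction alpha out of the validation set and add it to training.
import numpy as np

def mix_validation_into_train(X_train, y_train, X_val, y_val, alpha=0.1, seed=0):
    """Move a fraction alpha of the validation set into the training set."""
    rng = np.random.default_rng(seed)
    n_move = int(alpha * len(X_val))
    idx = rng.permutation(len(X_val))
    move, keep = idx[:n_move], idx[n_move:]
    X_train_new = np.concatenate([X_train, X_val[move]])
    y_train_new = np.concatenate([y_train, y_val[move]])
    return X_train_new, y_train_new, X_val[keep], y_val[keep]
```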